Modality-Attention Promotes the Neural Effects of Precise Timing Prediction in Early Sensory Processing

Precise timing prediction (TP) enables the brain to accurately predict when upcoming events will occur on a millisecond timescale, which is fundamental for adaptive behavior. The neural effects of TP within a single sensory modality have been widely studied. However, less is known about how precise TP works when the brain is concurrently faced with multimodal sensory inputs. Modality attention (MA) is a crucial cognitive function for dealing with the overwhelming information carried by multimodal sensory inputs. It is therefore necessary to investigate whether and how MA influences the neural effects of precise TP. This study designed a visual–auditory temporal discrimination task in which MA was allocated to the visual or auditory modality, and TP was manipulated into no timing prediction (NTP), matched timing prediction (MTP), and violated timing prediction (VTP) conditions. Behavioral and electroencephalogram (EEG) data were recorded from 27 subjects, and event-related potentials (ERPs), time–frequency distributions of inter-trial coherence (ITC), and event-related spectral perturbation (ERSP) were analyzed. In the visual modality, precise TP led to variations in N1 amplitude and in 200–400 ms theta ITC. These variations emerged only when the visual modality was attended. In the auditory modality, the MTP had a larger P2 amplitude and delta ITC than the other TP conditions when the auditory modality was attended, whereas these differences disappeared when it was unattended. The results suggest that MA promoted the neural effects of precise TP in early sensory processing, providing further neural evidence on the interactions between TP and MA.


Introduction
Sub-second timing prediction (TP) enables humans to accurately predict the occurrence of upcoming events. It can speed up behavior, facilitate perception, and optimize the allocation of cognitive resources [1,2]. The cerebellum and basal ganglia are the major neural structures responsible for timing prediction, playing key roles in single-interval and rhythmic timing, respectively [3,4]. The supplementary motor area and the medial entorhinal cortex also contribute to the TP process [5][6][7][8]. Neural responses to precise TP within a single sensory modality have been widely studied [1,2]. However, less is known about how TP works when the brain is concurrently faced with multimodal sensory inputs. Modality attention (MA) is the brain's ability to prioritize information from a specific sensory modality, which can mitigate the computational burden imposed by multimodal sensory inputs [9]. It is therefore important to investigate whether and how MA influences the neural effects of precise TP.
Precise TP modulates both pre-stimulus and evoked neural responses. Event-related potential (ERP) studies reported that the contingent negative variation (CNV) could index subjects' time estimation ability [10][11][12]. Neural oscillation studies highlighted that low-frequency activity (<15 Hz) can represent TP. Specifically, delta (1–3 Hz) phase reset produced stronger inter-trial coherence (ITC) at the predicted moment in both rhythmic and non-rhythmic tasks [13,14]. Frontal theta (4–7 Hz) ITC was modulated by the magnitude of the prediction error when subjects performed a visual temporal learning task, suggesting a close association with updating temporal information [15]. The alpha (8–15 Hz) phase immediately before visual stimulation was guided by top-down timing prediction [16,17], and alpha power changes were also found in some timing prediction studies [18,19]. Predictive coding theory regards the brain as a prediction machine that actively infers the external world and attempts to match incoming sensory inputs with top-down predictions [20][21][22]; inspired by it, there is growing agreement that TP is a neural implementation of predictive coding in the time domain [23,24]. Thus, comparing evoked responses to stimuli that appear exactly at the predicted moment (matched timing prediction, MTP) and not at it (violated timing prediction, VTP) promises to better reflect the neural effect of precise TP. This hypothesis was supported by the observation that N1-P2 amplitudes indexed subjective time more accurately than the CNV [25]. In our previous study, TP was manipulated into different conditions in a visual task. In the early sensory processing stage (less than 400 ms after target onset), the MTP condition produced ERP profiles similar to those of the no timing prediction (NTP) condition, whereas the VTP condition suppressed N1 and enhanced N2 over the occipital area [26].
However, this TP neural effect was observed when only visual stimuli were present. It remains unclear whether this opposing effect still occurs when the brain is concurrently faced with audio-visual stimuli, and whether neural responses in the auditory modality resemble those in the visual modality.
MA is a crucial cognitive function for dealing with the overwhelming information carried by multimodal sensory inputs. When auditory and visual stimuli are presented together, MA may optimize sensory processing in a specific modality. However, previous studies have concentrated more on how MA influences the multisensory integration of time information [27,28]. Neural evidence is still lacking on whether and how MA influences the processing of precise TP, in two respects in particular. First, it remains controversial whether the neural effect of precise TP is independent of MA or differs between attended and unattended conditions. One EEG study addressed this by concurrently manipulating visual-tactile attention and rhythm-based timing prediction within a single experiment: TP began to operate before MA, and the two processes had opposing effects on early evoked responses [29]. However, to the best of our knowledge, no study has investigated how auditory-visual attention modulates the neural responses of single-interval precise TP. Second, previous studies have suggested that precise TP changes either early evoked ERPs or low-frequency neural oscillations, but it remains unclear which features better reflect the neural effects of TP. Further investigation is needed to determine how these features change when a modality is attended or unattended.
This study investigated how audio-visual modality attention influences the neural effects of single-interval precise TP. EEGs from 27 subjects were recorded and analyzed; ERPs and time-frequency distributions of ITC and event-related spectral perturbation (ERSP) [30] were calculated and compared. The experiment included three TP conditions (NTP, MTP, and VTP) and four MA conditions: visual-attended (Va), visual-unattended (Vua), auditory-attended (Aa), and auditory-unattended (Aua). We found that (i) in the visual modality, TP led to the opposing N1-N2 pattern only when the modality was attended; (ii) in the auditory modality, when the modality was attended, the MTP produced the largest neural responses in the P2 time window among the TP conditions, and these differences disappeared when it was unattended; and (iii) low-frequency ITC better reflected the modulations of both TP and MA. These results suggest that MA can promote the neural effects of precise single-interval TP in early sensory processing.

Experimental Procedure
The formal experiment took place in an electrically shielded room and included six mental tasks. As described in Figure 1c (left), the first three were visual tasks. In the first task, participants were required to indicate the onset of the second flash, so no specific moment was predicted. In the second task, participants had to indicate whether the second flash appeared 400 ms after the first; under this condition, 400 ms after the first flash was the only predicted moment. In the third task, participants had to indicate whether the second flash appeared 600 ms after the first, i.e., 600 ms after the first flash was the predicted moment. The other three tasks were auditory tasks. In the first auditory task, participants were required to indicate the onset of the second beep. In the second auditory task, participants had to indicate whether the second beep appeared 400 ms after the first. In the third auditory task, participants had to indicate whether the second beep appeared 600 ms after the first beep. During the auditory discrimination tasks, participants were required to keep their eyes on the visual stimuli at all times, so that their visual inputs were identical to those in the visual tasks. Participants responded by pressing buttons with the right or left thumb, with the response hand balanced across blocks. Each task had four blocks, giving twenty-four blocks in total, which were presented in random order.
A sufficiently precise predictive template is a prerequisite for successfully manipulating precise TP. For this reason, participants were trained for three days before the formal experiment, and only those whose discrimination accuracy exceeded 80% proceeded to the formal experiment. On the first training day, participants learned the three timing intervals (TIs; TI400, TI600, and TI900) and practiced discriminating them by watching double flashes with a single timing interval, with the specific TI cued by the experimenter before each trial. After ~20 min of learning, they were asked to identify the TI of each presented double flash by pressing buttons. On the second training day, subjects performed a visual or auditory temporal discrimination task in which they judged whether the actual double flash or double beep was TI400 or TI600. Notably, each block included 10–15 trials whose intervals were randomly selected from 300 ms, 500 ms, and 800 ms. These untrained TIs were added to prevent participants from realizing that the formal test contained only three kinds of actual stimuli and from allocating more attentional resources to those three specific moments. On the third training day, subjects underwent the same training as on the second day.
Figure 1. Illustration of the experimental design: (a) The general experimental process: there were six kinds of visual-auditory combinations in total, which are listed in the table. The yellow and red boxes indicate the trials selected for visual and auditory analyses, respectively. (b) An example of the detailed parameters. (c) The list of mental tasks, with 1–6 representing the tasks described above. These numbers do not represent the presentation order of the tasks. (d) The manipulation of modality attention (MA) and precise timing prediction (TP) in the visual and (e) auditory modality.

Experimental Design for Forming Distinct MA-TP Conditions
MA was manipulated through task relevance. In the visual tasks, attentional resources were allocated to the visual modality, so tasks 1–3 were visual-attended (Va) (Figure 1d, upper) and auditory-unattended (Aua) (Figure 1e, lower). In the auditory tasks, attention was allocated to the auditory modality, so tasks 4–6 were auditory-attended (Aa) (Figure 1d, lower) and visual-unattended (Vua) (Figure 1e, upper).
Distinct TP conditions were formed by the interaction between the mental task and the actual onset moment of the second stimulus. Only trials with the 400 ms TI were extracted, as the yellow (visual modality) and red (auditory modality) boxes in Figure 1a show. Thus, the physical stimuli analyzed were identical across tasks, whereas the predicted moment in subjects' minds varied with the mental task. Specifically, in tasks 1 and 4, no specific moment was predicted (the NTP condition). In this condition, the induced neural activity was least influenced by top-down processes, so it can serve as a baseline for studying how TP changes evoked neural responses. In tasks 2 and 5, the only predicted moment was 400 ms after the first stimulus, so the actual stimulus occurred exactly at the predicted moment (the MTP condition). In tasks 3 and 6, the only predicted moment was 600 ms after the first stimulus, so the actual stimulus occurred before the predicted moment (the VTP condition).
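The mapping from mental task to MA-TP condition described above is essentially a lookup. The sketch below is a hypothetical illustration, not the authors' analysis code: the task numbering follows Figure 1c, while the `TASKS` table, the `classify_condition` name, and the use of `None` for "no predicted moment" are our own assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the six mental tasks (numbered as in Figure 1c):
# task -> (attended modality, predicted moment in ms after the first stimulus).
# None marks a task that oriented attention to no specific moment.
TASKS: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("visual", None),
    2: ("visual", 400),
    3: ("visual", 600),
    4: ("auditory", None),
    5: ("auditory", 400),
    6: ("auditory", 600),
}


def classify_condition(task: int, analyzed_modality: str,
                       actual_interval_ms: int = 400) -> Tuple[str, str]:
    """Return the (MA, TP) labels of a trial analyzed in one modality.

    Only TI400 trials were analyzed, hence the default actual interval.
    """
    attended, predicted = TASKS[task]
    prefix = "V" if analyzed_modality == "visual" else "A"
    ma = prefix + ("a" if attended == analyzed_modality else "ua")
    if predicted is None:
        tp = "NTP"  # no predicted moment: baseline condition
    elif predicted == actual_interval_ms:
        tp = "MTP"  # stimulus arrived at the predicted moment
    else:
        tp = "VTP"  # stimulus arrived at a non-predicted moment
    return ma, tp
```

For example, a TI400 trial from task 3 is labeled ("Va", "VTP") when the flashes are analyzed and ("Aua", "VTP") when the beeps are analyzed, so the six tasks yield the six MA-TP cells in each modality.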

EEG Recording and Pre-Processing
EEG was recorded using a 64-electrode Neuroscan SynAmps2 system at a sampling rate of 10,000 Hz, with a 50 Hz notch filter. All electrodes were positioned according to the international 10-20 system, referenced to the tip of the nose, and grounded at the frontal area. Additional bipolar electrodes recorded the electro-oculogram (EOG). Independent component analysis (ICA) was used to remove eye movement artifacts; eye-related components were identified by comparing individual ICA components with the EOG channels and by visual inspection. To ensure signal quality, all electrode impedances were kept below 10 kΩ.
In pre-processing, EEG data were low-pass filtered with an FIR filter at 40 Hz and down-sampled to 200 Hz. According to the experimental design, TP mainly operated after the onset of the second stimulus of the TI400 double stimulus, whereas responses to the first stimulus were largely unaffected. The onset of the second stimulus was therefore defined as the zero point. Correct trials with a reaction time of less than 80 ms (relative to the zero point) were defined as qualified trials; each MA-TP condition contained 35–40 qualified trials in total for subsequent EEG analyses.
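The filtering and down-sampling step can be sketched in pure Python as below. The text specifies only an FIR low-pass filter at 40 Hz followed by down-sampling to 200 Hz; the Hamming-windowed sinc design, the 10 Hz transition width, and the function names are assumptions, and a library routine (for example `scipy.signal.firwin`) would normally be used for the design. Going from 10,000 Hz to 200 Hz keeps every 50th sample, and the 40 Hz cutoff sits well below the new 100 Hz Nyquist frequency, so the same filter also serves as the anti-aliasing stage.

```python
import math
from typing import List, Sequence


def fir_lowpass_taps(cutoff_hz: float, fs_hz: float,
                     transition_hz: float = 10.0) -> List[float]:
    """Hamming-windowed sinc low-pass FIR filter, normalized to unit DC gain.

    For a Hamming window the transition width is roughly 3.3 * fs / num_taps,
    so the tap count is derived from the requested transition width.
    """
    num_taps = int(round(3.3 * fs_hz / transition_hz)) | 1  # force an odd length
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def lowpass_and_downsample(x: Sequence[float], fs_hz: float,
                           cutoff_hz: float = 40.0,
                           target_fs_hz: float = 200.0) -> List[float]:
    """Filter with a centered (zero-delay) kernel, then keep every factor-th sample.

    Samples outside the signal are treated as zero, so outputs within half a
    kernel length of either edge are attenuated.
    """
    factor = int(round(fs_hz / target_fs_hz))
    taps = fir_lowpass_taps(cutoff_hz, fs_hz)
    half = len(taps) // 2
    out = []
    for i in range(0, len(x), factor):  # evaluate the filter only at kept samples
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += t * x[idx]
        out.append(acc)
    return out
```

Centering the symmetric kernel on each output sample avoids any time shift, which matters because all latencies are later measured relative to the zero point. With 3,301 taps at 10,000 Hz this is slow in pure Python but adequate as an illustration.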

Data Processing and Analyses
This study analyzed EEG from the O1, Oz, and O2 electrodes to probe evoked responses in the visual modality, and EEG from the F1, Fz, and F2 electrodes for auditory responses. These electrodes were chosen following earlier studies that investigated visual or auditory neural responses at these sites [26,31,32].
For the ERP analyses, baseline correction was performed separately for the ERPs induced by the first and second stimuli, each using the 100 ms preceding that stimulus. This baseline was chosen because the study mainly investigated evoked responses rather than the CNV.
ERP and time-frequency analyses were used to measure the evoked neural responses under the distinct MA-TP conditions. In the visual modality, the N1 components induced by the first and second flashes and the N2 component induced by the second flash were selected for further analyses. Based on the visual ERP profiles, the temporal windows for the first N1, second N1, and second N2 were defined as 140–200 ms after the first flash (i.e., −240 to −200 ms relative to the zero point), 120–190 ms after the second flash, and 200–300 ms after the second flash, respectively. In the auditory modality, the temporal windows for the P2 components induced by the first and second beeps were defined as −230 to −180 ms and 110–250 ms, respectively. ERP amplitude was calculated as the mean amplitude within each temporal window.
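Baseline correction and window-mean amplitude can be sketched as follows. The 200 Hz sampling rate and the second-stimulus windows are taken from the text; the epoch start of −600 ms and the helper names are hypothetical, and windows are treated as inclusive at both ends.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

FS_HZ = 200            # sampling rate after down-sampling
EPOCH_START_MS = -600  # hypothetical epoch start, relative to the second-stimulus onset

# Windows (ms) relative to the second-stimulus onset, taken from the text.
WINDOWS_MS: Dict[str, Tuple[int, int]] = {
    "visual_N1_second": (120, 190),
    "visual_N2_second": (200, 300),
    "auditory_P2_second": (110, 250),
}


def ms_to_index(t_ms: float) -> int:
    """Convert a latency (ms, relative to the zero point) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)


def baseline_correct(erp: Sequence[float], onset_ms: float,
                     baseline_ms: float = 100) -> List[float]:
    """Subtract the mean of the baseline_ms immediately preceding onset_ms."""
    base = mean(erp[ms_to_index(onset_ms - baseline_ms):ms_to_index(onset_ms)])
    return [v - base for v in erp]


def mean_amplitude(erp: Sequence[float], window_ms: Tuple[int, int]) -> float:
    """Mean amplitude over an inclusive latency window."""
    start, stop = window_ms
    return mean(erp[ms_to_index(start):ms_to_index(stop) + 1])
```

For the second-stimulus ERPs the baseline is `baseline_correct(erp, onset_ms=0)`; for the first-stimulus ERPs, with the first stimulus at −400 ms in TI400 trials, it would be `onset_ms=-400`.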
ITC and ERSP were calculated to show event-related neural dynamics as time-frequency distributions. ITC measures phase synchronization across the trials time-locked to a set of experimental events and takes values between 0 and 1; the larger the ITC value, the stronger the phase synchronization. ITC was calculated as in Equation (1). ERSP was used to visualize event-related changes in spectral power over time, with a baseline covering the 100 ms before the first stimulus, and was calculated as in Equation (2). Based on inspection of the time-frequency distributions, three temporal windows were selected for the visual ITC analyses (−300 to −150 ms, 100–200 ms, and 200–300 ms relative to the zero point) and three for the auditory ITC analyses (−300 to −150 ms, 100–200 ms, and 200–400 ms). For the ERSP analyses, the temporal windows of −200 to −100 ms, 100–200 ms, and 200–400 ms were selected for both the visual and auditory modalities. The delta, theta, alpha, and beta frequency bands were defined as 1–3 Hz, 4–8 Hz, 8–14 Hz, and 15–30 Hz, respectively.
Brain Sci. 2023, 13, 610
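A common formulation of the two measures, used by toolboxes such as EEGLAB and assumed here, is ITC(f, t) = |(1/n) Σ_k F_k(f, t) / |F_k(f, t)|| and ERSP(f, t) = (1/n) Σ_k |F_k(f, t)|², where F_k(f, t) is the complex time-frequency coefficient of trial k and n is the number of trials; ERSP is then expressed in dB relative to the mean baseline power. The sketch takes the coefficients as given (from any wavelet or short-time Fourier decomposition) and also averages a time-frequency map over one of the bands defined above; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Frequency bands (Hz) as defined in the text.
BANDS: Dict[str, Tuple[float, float]] = {
    "delta": (1, 3), "theta": (4, 8), "alpha": (8, 14), "beta": (15, 30),
}


def itc(coeffs: Sequence[complex]) -> float:
    """ITC at one (f, t) point: length of the mean unit phase vector across trials."""
    unit = [c / abs(c) for c in coeffs]  # discard amplitude, keep phase
    return abs(sum(unit) / len(unit))


def ersp_db(trials: Sequence[Sequence[complex]], baseline_idx: range) -> List[float]:
    """ERSP at one frequency: trial-averaged power in dB relative to the baseline mean.

    trials[k][t] is the complex coefficient F_k(f, t) of trial k at time sample t.
    """
    n_times = len(trials[0])
    power = [sum(abs(tr[t]) ** 2 for tr in trials) / len(trials) for t in range(n_times)]
    base = sum(power[t] for t in baseline_idx) / len(baseline_idx)
    return [10 * math.log10(p / base) for p in power]


def band_window_mean(tf_map: Sequence[Sequence[float]], freqs: Sequence[float],
                     times: Sequence[float], band: Tuple[float, float],
                     window: Tuple[float, float]) -> float:
    """Average a time-frequency map (rows = freqs, cols = times) over a band and window."""
    vals = [tf_map[i][j]
            for i, f in enumerate(freqs) if band[0] <= f <= band[1]
            for j, t in enumerate(times) if window[0] <= t <= window[1]]
    return sum(vals) / len(vals)
```

Because each coefficient is divided by its magnitude, ITC ignores amplitude: identical phases give 1, and phases spread evenly around the circle give 0, matching the 0 to 1 range stated above.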

Statistical Analyses
In the behavioral analysis, paired-samples t tests were used to compare the visual and auditory tasks, and one-way repeated-measures analyses of variance (ANOVAs) were used to compare reaction time and accuracy across the NTP, MTP, and VTP conditions. EEG measures were analyzed with two-way (MA × TP) repeated-measures ANOVAs. In the visual modality, we tested four separate hypotheses, each predicting an MA-TP interaction on one measure: (i) N1 amplitude, (ii) N2 amplitude, (iii) ITC, and (iv) ERSP. Similarly, in the auditory modality we tested three hypotheses predicting an MA-TP interaction on (i) P2 amplitude, (ii) ITC, and (iii) ERSP. For each hypothesis, if no interaction was found, the main effects of MA and TP were tested separately; if an interaction was found, the simple effect of TP was tested separately under the attended and unattended conditions. Within each hypothesis, the Bonferroni method was used to correct for multiple comparisons.
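The follow-up logic above can be sketched as follows. The omnibus two-way repeated-measures ANOVA is assumed to come from a statistics package and is not reimplemented; `paired_t`, `bonferroni`, and `follow_up_tests` are hypothetical helper names, and converting t to a p-value would additionally require the t distribution (for example from SciPy).

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def follow_up_tests(p_interaction: float, alpha: float = 0.05) -> List[str]:
    """Follow-up tests after an MA x TP repeated-measures ANOVA, per the procedure above."""
    if p_interaction < alpha:
        # Interaction present: test the TP effect within each MA level.
        return ["TP | attended", "TP | unattended"]
    # No interaction: test each main effect on its own.
    return ["main effect of MA", "main effect of TP"]
```

Adjusting the p-values (rather than dividing alpha) keeps reported values comparable to the "p after Bonferroni correction" figures in the results.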

Behavioral Results
A total of 27 adults (14 female) participated in the experiment. Reaction time and accuracy were analyzed to examine (i) whether the visual and auditory tasks led to similar behavioral results and (ii) whether the TP manipulations were effective. Reaction time was defined as the interval between the onset of the second stimulus and the button press. Accuracy was the ratio of correct trials with a reaction time of less than 80 ms to the total number of trials. As Figure 2a shows, accuracy was high for both the visual and auditory tasks (95.69% and 96.73%, respectively). Reaction times were 358.15 ms and 364.53 ms for the visual and auditory tasks, respectively.
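The behavioral definitions above translate directly into code. In this sketch the `Trial` record and the function names are hypothetical, the reaction-time cutoff is passed in as a parameter rather than fixed, and averaging RT over qualified trials only is our assumption, since the text does not say which trials enter the mean.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Trial:
    second_onset_ms: float     # onset of the second stimulus
    press_ms: Optional[float]  # button-press time; None if no response
    correct: bool


def reaction_time(trial: Trial) -> float:
    """Reaction time: button press minus second-stimulus onset."""
    return trial.press_ms - trial.second_onset_ms


def behavior_summary(trials: List[Trial], max_rt_ms: float) -> Tuple[float, float]:
    """Return (accuracy, mean RT of qualified trials).

    A trial qualifies if it is correct and its RT is below max_rt_ms;
    accuracy is the number of qualified trials divided by all trials.
    """
    qualified = [t for t in trials
                 if t.correct and t.press_ms is not None and reaction_time(t) < max_rt_ms]
    accuracy = len(qualified) / len(trials)
    mean_rt = mean(reaction_time(t) for t in qualified) if qualified else float("nan")
    return accuracy, mean_rt
```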
TP was manipulated via the interaction between the predicted and actual onset moments, so if the manipulation was successful, subjects would behave differently even when faced with identical stimuli. This study mainly investigated behavioral results for the TI400 stimuli, because longer intervals would suffer from information disclosure and would not fully reflect TP effects. As Figure 2b,e shows, in the Va conditions the reaction times were 291.95 ms, 442.28 ms, and 606.58 ms for NTP, MTP, and VTP, respectively, with accuracy rates of 99.9%, 98.2%, and 91.2%. In the Aa conditions, the reaction times were 300.40 ms, 435.48 ms, and 566.44 ms for NTP, MTP, and VTP, respectively, with accuracy rates of 99.96%, 97.83%, and 93.83%. The NTP task, which required a single response rather than a choice between two alternatives, was the easiest of all the mental tasks. It is therefore not surprising that the NTP had the shortest reaction times (Va: F(2,52) = 176.49, p < 0.001, NTP vs. MTP/VTP: p < 0.001; Aa: F(2,52) = 91.54, p < 0.001, NTP vs. MTP/VTP: p < 0.001; all Bonferroni-corrected) and the highest accuracy rates (Va: F(2,52) = 50.147, p < 0.001, NTP vs. MTP: p = 0.003, NTP vs. VTP: p < 0.001; Aa: F(2,52) = 16.353, p < 0.001, NTP vs. MTP: p = 0.007, NTP vs. VTP: p < 0.001; all Bonferroni-corrected). Moreover, compared with the VTP, the MTP condition had higher accuracy rates (visual: p < 0.001; auditory: p < 0.001) and shorter reaction times (visual: p < 0.001; auditory: p < 0.001), suggesting that an appropriate TP can improve behavioral performance. Together, these results show that the visual and auditory tasks produced similar behavioral performance and confirm the effectiveness of the TP manipulations in both modalities.
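The F(2,52) statistics above come from one-way repeated-measures ANOVAs over 27 subjects and 3 TP conditions, whose degrees of freedom are (k − 1, (k − 1)(n − 1)) = (2, 52). A minimal sketch of the textbook computation follows (function name hypothetical); it applies no sphericity correction, and whether the authors applied one is not stated.

```python
from typing import Sequence, Tuple


def rm_anova_oneway(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA without sphericity correction.

    data[s][c] is the score of subject s in condition c.
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)  # removed from the error term
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_cond - ss_subj  # subject x condition residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error
```

Removing the between-subject sum of squares from the error term is what distinguishes this from an ordinary one-way ANOVA, and it is why each subject must contribute a score in every condition.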

ERP Analyses
Visual ERPs and topographies were obtained for six MA-TP conditions (Va-NTP, Va-MTP, Va-VTP, Vua-NTP, Vua-MTP, and Vua-VTP). We first analyzed the N1 amplitude in the ERP profiles induced by the first flash (Figure 3a,d): neither an interaction nor a main effect of TP or MA was found. Following the second flash, the N1 and N2 components varied strikingly across conditions, and their topographies also differed (Figure 3c,g). In the N1 period, a clear attentional enhancement was observed (F(1,26) = 19.190, p < 0.001, η² = 0.434) (Figure 3e). Moreover, in the Va conditions, the VTP led to a much smaller N1 amplitude than the MTP (F(2,52) = 3.762, p = 0.035, MTP vs. VTP: p = 0.001, after Bonferroni correction), but no TP difference was found in the Vua conditions. After N1, the attended NTP and MTP waveforms trended upward, whereas the attended VTP waveform first decreased before rising again, resulting in a more negative N2 component (F(2,52) = 4.087, p = 0.024, MTP vs. VTP: p = 0.001). In the Vua condition, however, the ERP profiles of the NTP, VTP, and MTP were similar (Figure 3b).
Auditory ERPs and topographies for the six MA-TP conditions (Aa-NTP, Aa-MTP, Aa-VTP, Aua-NTP, Aua-MTP, and Aua-VTP) are shown in Figure 4. Auditory ERPs consisted mainly of slow waves. Based on inspection of the profiles, the P2 components induced by the first and second beeps were selected for further analyses. As Figure 4c shows, there was no interaction, MA main effect, or TP main effect after the first beep. Figure 4d compares the P2 induced by the second beep. There was a significant MA-TP interaction (F(2,52) = 3.573, p = 0.035, η² = 0.125). In the Aa condition, the MTP response was much larger than the NTP response (p = 0.015 after Bonferroni correction), whereas these differences disappeared in the Aua condition.
In summary, the ERP analyses showed MA-TP interactions on the N1 and N2 amplitudes in the visual modality: the VTP resulted in a smaller N1 and a larger (more negative) N2 only in the Va condition. In the auditory modality, there was an MA-TP interaction on P2 amplitude: the MTP led to a larger P2 only in the Aa condition.
Within 400 ms after the second stimulus onset, TP caused theta ITC variations in the Va condition and delta ITC variations in the Aa condition. However, the TP-induced differences disappeared in both the Vua and Aua conditions. This suggests that the neural effect of TP was only evident when MA was allocated to the corresponding modality.

ERSP time-frequency distributions in the auditory modality are shown in Figure 8. In the −200 to −100 ms period, auditory attention enhanced alpha ERSP (F(1,26) = 7.059, p = 0.013, ηp² = 0.214). After the second beep, an ERSP increase first emerged but showed no difference among conditions. A beta suppression followed, which showed an MA main effect (F(1,26) = 10.813, p = 0.003, ηp² = 0.294).

Overall, MA led to larger alpha ERSPs in both the visual and auditory modalities after the first stimulus onset. After the second stimulus, attention increased theta ERSP. Beta differences induced by TP were found only in the Va condition.

Discussion
This study investigated how MA and precise single-interval TP modulate early sensory responses. In the visual modality, after the second flash, distinct TP conditions affected N1-N2 amplitudes, and theta ITC differences were observed only in the Va condition, with no difference in the Vua condition. In the auditory modality, distinct TP conditions led to P2 amplitude variations and delta ITC differences only in the Aa condition, with no difference in the Aua condition. These results suggest that MA increased the TP-related response differences.

The MA-TP Mainly Modulated Low-Frequency ITCs in Early Sensory Processing
We first analyzed ERP signatures, namely the N1-N2 components in the visual modality and the P2 component in the auditory modality. The differences among the MA-TP conditions were small, as measured by partial eta squared (ηp²). One possible reason is that the MA-TP changes were masked by the large-amplitude low-frequency ERP components. It was therefore necessary to analyze neural activity in distinct frequency bands.
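Partial eta squared can be recovered from any reported F and its degrees of freedom, since ηp² = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The minimal sketch below (function name hypothetical) reproduces the values reported for the auditory ERSP effects, F(1,26) = 7.059 and F(1,26) = 10.813.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error)
            = F * df_effect / (F * df_effect + df_error)
    """
    return f_value * df_effect / (f_value * df_effect + df_error)


# Auditory ERSP effects reported in the results: F(1,26) = 7.059 and 10.813.
print(round(partial_eta_squared(7.059, 1, 26), 3))   # 0.214
print(round(partial_eta_squared(10.813, 1, 26), 3))  # 0.294
```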
As expected, the ITC time-frequency distributions showed more obvious differences among the six MA-TP conditions in both the visual and auditory modalities. In the visual modality, alpha ITC after the first flash was significantly increased by MA but was not sensitive to TP modulation. In contrast, theta ITC was sensitive to both MA and TP modulation, with a significant MA-TP interaction. Theta-band activity is traditionally related to specific cognitive control functions [33], such as maintenance of working memory [34], sustained attention [35], shifts of spatial attention [36], and management of prediction errors in perceptual learning [15,37]. A recent study reported that theta ITC is instrumental in shaping temporal predictions in early sensory processing [23]. For example, in rhythmic predictive timing, phase reset aligns the ideal phase of delta-theta oscillations with the stimulus, and this alignment correlates with the subsequent evoked ERPs [23]. In a time estimation task with rotating intervals, frontal theta ITC was modulated by error magnitude, possibly indexing the degree of surprise [15]. The current study found that theta ITC not only was closely associated with TP but also reflected the interaction between MA and TP, extending the traditional role of theta ITC. Furthermore, previous studies proposed that posterior theta is related to stimulus processing but unaffected by task demands [31]. In contrast, this study found that specific theta changes over the posterior scalp (O1, Oz, O2) reflected top-down modulation. These observations suggest that top-down modulation of theta activity can be observed not only over frontal areas but also over the visual cortex.
The auditory ITC was concentrated in the delta and theta bands, and its condition differences lay in lower frequency bands than those of the visual responses. Delta phase synchronization has been widely accepted as a neural mechanism underlying rhythmic timing prediction [23], and recent studies further demonstrated that delta phase also serves as a neural mechanism of single-interval timing prediction [14]. It is therefore reasonable to observe delta ITC changes among the TP conditions in the current single-interval precise TP task. ITC in the 200–400 ms period was modulated jointly by MA and TP. To the best of our knowledge, this may be novel neural evidence regarding the role of delta ITC.
Additionally, alpha ERSP was affected by MA and beta ERSP by TP. After the first stimulus onset, smaller alpha ERSP was found in both the Va and Aa conditions. Alpha oscillations play a key role in many mental tasks [38], especially attentional processes [17,39], so MA-related alpha changes are to be expected here. Beta ERSP was modulated by TP in the Va conditions. Many predictive timing studies have reported suppression of post-stimulus beta power, and the data suggest that it may play a key role in time maintenance or prediction error encoding [40][41][42][43]. This study found a much smaller beta ERSP for the VTP 200–400 ms after the second flash in the Va conditions, suggesting that the VTP may involve a longer period of time maintenance or larger prediction error encoding.

Visual MA Promoted the TP-Related Neural Effect in N1 and N2 Period
In the visual modality, the NTP, MTP, and VTP responses of single-interval precise TP were compared between the Va and Vua conditions. In the 100–300 ms (N1) period of the Va condition, the MTP had almost the same ERP and ITC as the NTP, which was least influenced by top-down factors, whereas the VTP had a much smaller theta ITC than the MTP. In the 300–400 ms (N2) period, the VTP led to much larger negative waveforms than the other conditions. Given that TP is a neural implementation of predictive coding in the time domain [23,24], the N1 and N2 patterns may be explained by the 'sharpen' and 'dampen' effects of predictive coding theory, respectively. According to predictive coding, two types of neurons are involved in prediction: one type encodes information that matches the expected feature, and the other encodes information that differs from it (i.e., prediction error). The 'sharpen' account proposes that neurons not tuned to the expected information are suppressed, making the expected features more salient and selective. The 'dampen' account suggests that information differing from the expected feature yields larger responses, which encode prediction errors [44][45][46]. Correspondingly, the VTP condition, which represented unexpected information, was suppressed, whereas the MTP condition was unaffected, having almost the same ITC as the NTP; this is consistent with the 'sharpen' effect. Regarding the N2 variations, the VTP produced a more negative waveform than the MTP in the Va conditions. Negative waveforms have been suggested as a neural representation of prediction error encoding [47,48], and the 'dampen' effect may explain this negativity. Therefore, in the Va conditions, we found a 'sharpened' N1 and a 'dampened' N2.
These results demonstrate that when MA was allocated to the visual modality, N1 was sharpened and N2 was dampened, replicating the results of our previous study even when the brain was concurrently faced with visual-auditory stimuli. However, this N1-N2 pattern disappeared when the visual modality was unattended, which suggests that MA promoted the neural effects of TP in the visual modality.

Auditory MA Also Promoted the TP-Related Neural Effect
We then investigated auditory neural responses. In the Aa condition, the P2 component differed between the VTP and MTP conditions. The MTP waveform rose rapidly from 100 ms after the second beep, whereas the VTP waveform rose more gradually, leading to a relatively negative waveform. This negativity may also be explained by the 'dampen' effect, reflecting the encoding of prediction errors. In the Aua condition, however, no significant difference was found, suggesting that auditory MA promoted the TP neural effect.
The TP neural effects differed somewhat between the visual and auditory modalities. Compared with the visual results, the auditory responses showed clear TP-related differences in a lower frequency band. Visual responses revealed both the sharpen and dampen effects, whereas the auditory P2 showed only the dampen effect. There are two potential explanations for this. First, early sensory processing may differ fundamentally between visual and auditory stimuli. Second, auditory ERPs are primarily located over the frontal-central area, where the neural circuits of predictive timing are also found, so the neural signatures of early auditory processing may be mixed with signals related to top-down control of higher cognitive processes. A better understanding of the neural effect of TP in the auditory modality will therefore require separating purely auditory responses from prediction-related variations over the frontal area.

Conclusions
This study investigated how visual-auditory modality attention influences the neural effect of single-interval precise TP. In both the visual and auditory modalities, MA increased TP-related differences: a sharpened N1 and a dampened N2 were found only in the Va conditions, whereas a dampened P2 was found only in the Aa conditions. These results may provide new neural evidence for understanding the interactions between MA and TP.